102 research outputs found

    Lyndon Arrays Simplified

    A Lyndon word is a string that is lexicographically smaller than all of its proper suffixes (e.g., "airbus" is a Lyndon word; "amtrak" is not a Lyndon word because its suffix "ak" is lexicographically smaller than "amtrak"). The Lyndon array (sometimes called the Lyndon table) identifies the longest Lyndon prefix of each suffix of a string. It is well known that the Lyndon array of a length-n string can be computed in O(n) time. However, most of the existing algorithms require the suffix array, which has theoretical and practical disadvantages. The only known algorithms that compute the Lyndon array in O(n) time without the suffix array (or similar data structures) do so in a particularly space-efficient way (Bille et al., ICALP 2020), or in an online manner (Badkobeh et al., CPM 2022). Due to the additional goals of space efficiency and online computation, these algorithms are complicated in technical detail. Using the main ideas of the aforementioned algorithms, we provide a simpler, easier-to-understand algorithm that computes the Lyndon array in O(n) time.
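The two definitions in this abstract can be checked directly with a brute-force sketch. This is not the O(n) algorithm the paper describes; it is only an illustration of the definitions, and the function names `is_lyndon` and `lyndon_array_naive` are made up for this example. Positions are 0-based and characters are compared by code point.

```python
def is_lyndon(w: str) -> bool:
    """True if w is non-empty and strictly smaller than each proper non-empty suffix."""
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def lyndon_array_naive(s: str) -> list[int]:
    """For each position i, the length of the longest Lyndon prefix of s[i:].

    Brute force over all prefixes of all suffixes; far slower than O(n).
    """
    result = []
    for i in range(len(s)):
        longest = 1  # a single character is always a Lyndon word
        for length in range(2, len(s) - i + 1):
            if is_lyndon(s[i:i + length]):
                longest = length
        result.append(longest)
    return result


if __name__ == "__main__":
    print(is_lyndon("airbus"), is_lyndon("amtrak"))
    print(lyndon_array_naive("banana"))
```

For "banana", the longest Lyndon prefixes of the six suffixes are "b", "an", "n", "an", "n", and "a".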

    Bidirectional Text Compression in External Memory

    Bidirectional compression algorithms work by replacing repeated substrings with references that, unlike in the famous LZ77 scheme, can point in either direction. We present such an algorithm that is particularly suited for an external memory implementation. We evaluate it experimentally on large data sets of size up to 128 GiB (using only 16 GiB of RAM) and show that it is significantly faster than all known LZ77 compressors, while producing a roughly similar number of factors. We also introduce an external memory decompressor for texts compressed with any uni- or bidirectional compression scheme.

    Lyndon Arrays in Sublinear Time

    […] σ} with σ ≤ n. In this case, the string can be stored in O(n log σ) bits (or O(n / log_σ n) words) of memory, and reading it takes only O(n / log_σ n) time. We show that O(n / log_σ n) time and words of space suffice to compute the succinct 2n-bit version of the Lyndon array. The time is optimal for w = O(log n). The algorithm uses precomputed lookup tables to perform significant parts of the computation in constant time. This is possible due to properties of periodic substrings, which we carefully analyze to achieve the desired result. We envision that the algorithm has applications in the computation of runs (maximal periodic substrings), where the Lyndon array plays a central role in both theoretically and practically fast algorithms.
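The bit bound and the word bound quoted in this abstract are the same quantity in different units. A short derivation, under the assumptions that σ denotes the alphabet size with σ ≥ 2 and that a machine word holds Θ(log n) bits:

```latex
\[
  \frac{n \lceil \log_2 \sigma \rceil \ \text{bits}}{\Theta(\log_2 n)\ \text{bits per word}}
  = O\!\left( \frac{n \log_2 \sigma}{\log_2 n} \right)
  = O\!\left( \frac{n}{\log_\sigma n} \right) \ \text{words},
  \qquad \text{because } \log_\sigma n = \frac{\log_2 n}{\log_2 \sigma}.
\]
```

Reading the packed input one word at a time therefore takes O(n / log_σ n) time.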

    Linear Time Runs Over General Ordered Alphabets

    A run in a string is a maximal periodic substring. For example, the string bananatree contains the runs anana = (an)^{5/2} and ee = e^2. There are fewer than n runs in any length-n string, and computing all runs for a string over a linearly-sortable alphabet takes O(n) time (Bannai et al., SODA 2015). Kosolobov conjectured that there also exists a linear time runs algorithm for general ordered alphabets (Inf. Process. Lett. 2016). The conjecture was almost proven by Crochemore et al., who presented an O(n α(n)) time algorithm (where α(n) is the extremely slowly growing inverse Ackermann function). We show how to achieve O(n) time by exploiting combinatorial properties of the Lyndon array, thus proving Kosolobov's conjecture. Comment: This work has been submitted to ICALP 202
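The notion of a run can be made concrete with a brute-force sketch. It assumes the usual definition: a substring whose smallest period p satisfies length ≥ 2p and which cannot be extended in either direction without breaking that period. This is not the linear-time algorithm from the abstract, and the names `smallest_period` and `runs_naive` are invented for this illustration.

```python
def smallest_period(w: str) -> int:
    """Smallest p >= 1 such that w[k] == w[k + p] for all valid k (w non-empty)."""
    for p in range(1, len(w) + 1):
        if all(w[k] == w[k + p] for k in range(len(w) - p)):
            return p
    return len(w)  # only reached for the empty string


def runs_naive(s: str) -> list[tuple[int, int, int]]:
    """All runs of s as (start, end_exclusive, period), found by brute force."""
    n = len(s)
    found = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            # Extend the stretch of positions that match at distance p.
            while k + p < n and s[k] == s[k + p]:
                k += 1
            end = k + p  # s[start:end] is a maximal substring with period p
            if end - start >= 2 * p and smallest_period(s[start:end]) == p:
                found.append((start, end, p))
    return sorted(found)


if __name__ == "__main__":
    for start, end, p in runs_naive("bananatree"):
        print(repr("bananatree"[start:end]), "period", p)
```

On "bananatree" this reports "anana" with period 2 and "ee" with period 1. The exponent of a run is its length divided by its period.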

    Parallel External Memory Wavelet Tree and Wavelet Matrix Construction


    Lyndon Words Accelerate Suffix Sorting

    Suffix sorting is arguably the most fundamental building block in string algorithmics, like regular sorting in the broader field of algorithms. It is thus not surprising that the literature is full of algorithms for suffix sorting, in particular focusing on their practicality. However, the advances in practical suffix sorting stalled with the emergence of the DivSufSort algorithm more than 10 years ago, which, to date, has remained the fastest suffix sorter. This article shows how properties of Lyndon words can be exploited algorithmically to accelerate suffix sorting again. Our new algorithm is 6-19% faster than DivSufSort on real-world texts, and up to three times as fast on artificial repetitive texts. It can also be parallelized, where similar speedups can be observed. Thus, we make the first advances in practical suffix sorting after more than a decade of standstill.

    Space Efficient Construction of Lyndon Arrays in Linear Time

    Given a string S of length n, its Lyndon array identifies for each suffix S[i..n] the next lexicographically smaller suffix S[j..n], i.e. the minimal index j > i with S[i..n] ≻ S[j..n]. Apart from its plain (n log_2 n)-bit array representation, the Lyndon array can also be encoded as a succinct parentheses sequence that requires only 2n bits of space. While linear time construction algorithms for both representations exist, it has previously been unknown if the same time bound can be achieved with less than Ω(n lg n) bits of additional working space. We show that, in fact, o(n) additional bits are sufficient to compute the succinct 2n-bit version of the Lyndon array in linear time. For the plain (n log_2 n)-bit version, we only need O(1) additional words to achieve linear time. Our space efficient construction algorithm makes the Lyndon array more accessible as a fundamental data structure in applications like full-text indexing.
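The next-smaller-suffix formulation used in this abstract can be illustrated with a brute-force sketch. It is quadratic or worse and is not the space-efficient linear-time construction the paper describes. The function names are hypothetical, indices are 0-based, and the empty suffix at index n acts as a sentinel that is smaller than every non-empty suffix.

```python
def next_smaller_suffix(s: str) -> list[int]:
    """nss[i] = smallest j > i such that s[j:] is lexicographically smaller than s[i:].

    The empty suffix s[n:] is smaller than any non-empty suffix, so j = n is
    always a valid fallback and the inner loop is guaranteed to find a match.
    """
    n = len(s)
    nss = [n] * n
    for i in range(n):
        for j in range(i + 1, n + 1):
            if s[j:] < s[i:]:
                nss[i] = j
                break
    return nss


def lyndon_lengths_from_nss(nss: list[int]) -> list[int]:
    """Distance to the next smaller suffix, which equals the length of the
    longest Lyndon prefix of the suffix starting at each position."""
    return [j - i for i, j in enumerate(nss)]


if __name__ == "__main__":
    nss = next_smaller_suffix("banana")
    print(nss)
    print(lyndon_lengths_from_nss(nss))
```

The second function reflects the known equivalence between the two common definitions of the Lyndon array: the longest Lyndon prefix of a suffix ends exactly where the next smaller suffix begins.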

    Search for dark matter produced in association with bottom or top quarks in √s = 13 TeV pp collisions with the ATLAS detector

    A search for weakly interacting massive particle dark matter produced in association with bottom or top quarks is presented. Final states containing third-generation quarks and missing transverse momentum are considered. The analysis uses 36.1 fb⁻¹ of proton–proton collision data recorded by the ATLAS experiment at √s = 13 TeV in 2015 and 2016. No significant excess of events above the estimated backgrounds is observed. The results are interpreted in the framework of simplified models of spin-0 dark-matter mediators. For colour-neutral spin-0 mediators produced in association with top quarks and decaying into a pair of dark-matter particles, mediator masses below 50 GeV are excluded assuming a dark-matter candidate mass of 1 GeV and unitary couplings. For scalar and pseudoscalar mediators produced in association with bottom quarks, the search sets limits on the production cross-section of 300 times the predicted rate for mediators with masses between 10 and 50 GeV and assuming a dark-matter mass of 1 GeV and unitary coupling. Constraints on colour-charged scalar simplified models are also presented. Assuming a dark-matter particle mass of 35 GeV, mediator particles with mass below 1.1 TeV are excluded for couplings yielding a dark-matter relic density consistent with measurements.